See uwul/data/base.py for the dataset and dataset factory abstractions,
or refer to the guanaco dataset for a more detailed example.